[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test by AndreasKaratzas · Pull Request #37616 · vllm-project/vllm

AndreasKaratzas · 2026-03-19T23:57:24Z

Follow-up for:

[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline #34839

Stabilizes the Cohere test that was failing due to batch invariance issues on ROCm. Addresses the failure in mi325_1: Entrypoints Integration (Pooling).

Motivation: https://buildkite.com/vllm/amd-ci/builds/6701/steps/canvas?sid=019d07a7-1a2e-4d29-91e7-9eb765bc4904&tab=output

[Feature][Scheduler] Add split prefix caching feature to eliminate bf16 GEMM tiling divergence across cache-hit/miss paths #34046
[Bug][ROCm]: Prefix caching produces different output on first request (cache miss) vs subsequent requests (cache hit) #33123

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

gemini-code-assist

Code Review

This pull request aims to fix a flaky test for Cohere/OpenAI embedding parity on ROCm by adding ROCM_EXTRA_ARGS to the test server's configuration. This introduces arguments to disable prefix caching and limit the maximum number of sequences to one on ROCm platforms. While this change successfully stabilizes the test, I have a concern that limiting sequences to one effectively disables batch processing, which undermines the purpose of the test_batch_parity test. My review includes a suggestion to handle this more explicitly to maintain test integrity.

tests/entrypoints/pooling/embed/test_cohere_openai_parity.py
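For readers following along, here is a minimal sketch of the kind of platform-conditional server arguments the review describes. The diff for this file is not shown in the thread, so the flag spellings (`--no-enable-prefix-caching`, `--max-num-seqs`), the `BASE_ARGS` contents, and the `build_server_args` helper are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical sketch: names and flag spellings are assumptions, not taken
# from the PR diff.

# Illustrative base arguments shared by all platforms.
BASE_ARGS = ["--max-model-len", "512"]

# Extra arguments applied only on ROCm: disable prefix caching and cap the
# scheduler at one sequence so results do not depend on batch composition.
ROCM_EXTRA_ARGS = ["--no-enable-prefix-caching", "--max-num-seqs", "1"]


def build_server_args(is_rocm: bool) -> list[str]:
    """Return a fresh argument list, appending the ROCm extras when needed."""
    extra = ROCM_EXTRA_ARGS if is_rocm else []
    return [*BASE_ARGS, *extra]


if __name__ == "__main__":
    # In a real test the flag would come from a platform check; here it is
    # passed in explicitly so the sketch runs anywhere.
    print(build_server_args(is_rocm=True))
```

As the review points out, capping sequences at one means requests are no longer batched together on ROCm, so a batch parity test run with these arguments checks less than its name suggests.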

AndreasKaratzas · 2026-03-20T00:01:07Z

Testing MI325 to see if issue is resolved (added rocm and ready labels).

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas · 2026-03-20T17:25:52Z

Test has been confirmed green: https://buildkite.com/vllm/amd-ci/builds/6732/steps/canvas?sid=019d0bd9-2a24-4529-a7c3-4c16a3f66397&tab=output

DarkLight1337 · 2026-03-21T03:57:20Z

vllm/entrypoints/pooling/base/serving.py

        await self._prepare_generators(ctx)
-        await self._collect_batch(ctx)
+        try:
+            await self._collect_batch(ctx)


Why is this needed? We now use app level error handlers to convert error responses

@DarkLight1337 The app-level Exception handler at api_server.py:270 handles dimensions=-1 correctly (returns 400), but for the immediately following dimensions=16 request on the same connection, the same ValueError from pooling_params.verify() escapes the Starlette ExceptionMiddleware and crashes the ASGI app. The client gets APIConnectionError instead of BadRequestError.

https://buildkite.com/vllm/amd-ci/builds/6711/steps/canvas?sid=019d088b-f229-4de8-923d-b4c48a62c6fb&tab=output

cc @andyxning
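The later commit in this PR is titled "Validate pooling params early to avoid async generator exception prop…agation". A minimal sketch of that idea follows, using plain asyncio and hypothetical stand-ins (`FakePoolingParams`, `prepare_late`, `prepare_early`, `where_it_fails`); it is not vLLM's serving code, only an illustration of why the point at which `verify()` runs matters.

```python
import asyncio


class FakePoolingParams:
    """Hypothetical stand-in for pooling params with a verify() step."""

    def __init__(self, dimensions: int, supported: tuple = (32, 64)):
        self.dimensions = dimensions
        self.supported = supported

    def verify(self) -> None:
        if self.dimensions not in self.supported:
            raise ValueError(f"unsupported dimensions: {self.dimensions}")


async def prepare_late(params: FakePoolingParams):
    """Validation runs inside the async generator, so only on iteration."""

    async def gen():
        params.verify()
        yield [0.0] * params.dimensions

    return gen()


async def prepare_early(params: FakePoolingParams):
    """Validation runs before any generator object is created."""
    params.verify()

    async def gen():
        yield [0.0] * params.dimensions

    return gen()


async def where_it_fails(prepare, params: FakePoolingParams) -> str:
    """Report the phase in which the ValueError surfaces."""
    try:
        gen = await prepare(params)
    except ValueError:
        return "prepare"
    try:
        async for _ in gen:
            pass
    except ValueError:
        return "collect"
    return "ok"


if __name__ == "__main__":
    bad = FakePoolingParams(dimensions=16)
    print(asyncio.run(where_it_fails(prepare_late, bad)))
    print(asyncio.run(where_it_fails(prepare_early, bad)))
```

With early validation the error is raised while the request is still in ordinary request-handling code, before any generator needs cleanup, which is presumably the property the fix relies on.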


noooop

thanks!

noooop · 2026-03-25T10:05:45Z

tests/entrypoints/pooling/embed/test_online_dimensions.py

        for dimensions in [-1, 16]:
            with pytest.raises(openai.BadRequestError):
                await make_request_and_correctness_test(dimensions)


May I ask why this test can pass on the NVIDIA GPU?

Or did this test not pass on the NVIDIA GPU?

In my understanding, NVIDIA GPUs should also suffer from the issue below.

but for the immediately following dimensions=16 request on the same connection, the same ValueError from pooling_params.verify() escapes the Starlette ExceptionMiddleware and crashes the ASGI app. The client gets APIConnectionError instead of BadRequestError.

The reason the observed issue surfaces more often on ROCm is likely timing. Slower engine processing widens the window in which the async generator cleanup and the ServerErrorMiddleware re-raise interact with the keep-alive connection state. On NVIDIA the race window is narrower, so the test may pass consistently or fail only intermittently. That is what I think is going on.

That's eye-opening.

How were you able to spot and fix a race condition bug? LOL

Oh, I didn't spot it at an exact line inside our stack 😅 I have mostly been focusing on the network sluggishness of our infra these days, so I am guessing this is what is going on.

cc @DarkLight1337 @andyxning

I have a bad feeling that some exceptions won't be caught by the app-level error handlers.

Or will the async main loop catch exceptions from another coroutine?

cc @yihong0618
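On the general asyncio side of that question, a small sketch may help. It does not reproduce the vLLM or Starlette behavior discussed above; it only shows standard asyncio semantics: an exception raised inside a separately scheduled task is stored on the task and reaches the caller only when the task is awaited or its result is retrieved. The names `failing_worker` and `demo` are made up for illustration.

```python
import asyncio


async def failing_worker() -> None:
    """Stand-in for a coroutine that fails validation."""
    raise ValueError("bad dimensions")


async def demo() -> tuple:
    task = asyncio.create_task(failing_worker())

    # Waiting for completion does not re-raise the task's exception, so a
    # try/except around unrelated code in the caller sees nothing.
    seen_while_waiting = False
    try:
        await asyncio.wait({task})
    except ValueError:
        seen_while_waiting = True

    # The exception is stored on the finished task.
    stored_type = type(task.exception()).__name__

    # Awaiting the task itself is what re-raises it in the caller.
    seen_on_await = False
    try:
        await task
    except ValueError:
        seen_on_await = True

    return seen_while_waiting, stored_type, seen_on_await


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

So a handler wrapped around the request coroutine only sees errors from work that coroutine actually awaits; anything that fails in another task or during generator cleanup takes a different path.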


[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test

de7d9d7

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

mergify bot added the rocm Related to AMD ROCm label Mar 19, 2026

github-project-automation bot added this to AMD Mar 19, 2026

github-project-automation bot moved this to Todo in AMD Mar 19, 2026

gemini-code-assist bot reviewed Mar 19, 2026


AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 20, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test

21ad38c

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

mergify bot added the frontend label Mar 20, 2026

AndreasKaratzas marked this pull request as ready for review March 20, 2026 17:25

AndreasKaratzas requested a review from noooop as a code owner March 20, 2026 17:25

AndreasKaratzas requested a review from DarkLight1337 March 20, 2026 21:43

DarkLight1337 reviewed Mar 21, 2026


AndreasKaratzas added 2 commits March 25, 2026 03:52

Merge remote-tracking branch 'origin/main' into akaratza_fixcohere_op…

24835c8

…enai

Validate pooling params early to avoid async generator exception prop…

3b81963

…agation Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested a review from njhill as a code owner March 25, 2026 09:32

noooop approved these changes Mar 25, 2026


noooop enabled auto-merge (squash) March 25, 2026 09:56

noooop reviewed Mar 25, 2026


noooop merged commit f262a62 into vllm-project:main Mar 25, 2026
49 of 50 checks passed

github-project-automation bot moved this from Todo to Done in AMD Mar 25, 2026

AndreasKaratzas deleted the akaratza_fixcohere_openai branch March 25, 2026 10:57

RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

5c0e685

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

HenryTangDev pushed a commit to HenryTangMain/vllm that referenced this pull request Mar 27, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

8357608

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

malaiwah pushed a commit to malaiwah/vllm that referenced this pull request Mar 27, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

63264c7

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Michel Belleau <michel.belleau@malaiwah.com>

nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

86cbf2b

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>

JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

a373246

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

d550609

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>

EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

dbc8ad7

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>

liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

5da3c40

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

49d7fd2

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Rishi Puri <riship@nvidia.com>

big-yellow-duck pushed a commit to EmbeddedLLM/vllm that referenced this pull request Apr 8, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

9cb820d

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

[ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (vllm-projec…

29db0a2

…t#37616) Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas mentioned this pull request Apr 13, 2026

Fix sf_error(NULL) race condition under concurrent open failures bastibe/python-soundfile#480

Open

3 participants
